

Section: New Results

Ontology-based data and document Management

Participants : Meghyn Bienvenu, François Goasdoué, Yassine Mrabet, Nathalie Pernelle, Gianluca Quercini, Chantal Reynaud, Brigitte Safar, Fabian Suchanek.

Semantic Annotation

We have started work on the semantic annotation of public administration data within the DataBridges project, an ICT Labs activity. We considered public data represented in tables, specifically tables created and published by INSEE. These are spreadsheets filled with statistics about geographic locations. They usually consist of multiple columns: one, which we term the subject column, contains a list of textual references to geographic entities (toponyms), while the others contain numeric attributes. We proposed an approach and an algorithm that assigns a type, or header, to the subject column of an INSEE table and identifies the geographic entities referred to by the toponyms in the column [64]. An external resource, DBpedia, is used to help disambiguate the entities mentioned in the tables, and a domain ontology ensures that the assigned types belong to the geographic domain. This work continues as part of a post-doctoral position funded by the ANR project DataBridges. Since the aim of the project is to enrich a data warehouse, a first task is to automatically build an initial RDF data warehouse from data collected from the web.
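The idea of typing a subject column and then resolving its toponyms can be illustrated with a minimal Python sketch. It is not the algorithm of [64]: the `GAZETTEER` dictionary is a hypothetical stand-in for DBpedia lookups, `GEO_TYPES` stands in for the domain ontology, and the majority vote is an assumed, simplified selection rule.

```python
from collections import Counter

# Hypothetical toy gazetteer standing in for DBpedia lookups:
# toponym -> list of (entity identifier, type) candidates.
GAZETTEER = {
    "Paris": [("dbr:Paris", "City"), ("dbr:Paris_(department)", "Department")],
    "Lyon": [("dbr:Lyon", "City")],
    "Essonne": [("dbr:Essonne", "Department"), ("dbr:Essonne_(river)", "River")],
    "Orsay": [("dbr:Orsay", "City")],
}

# Types admitted by the (assumed) geographic domain ontology.
GEO_TYPES = {"City", "Department", "Region"}


def annotate_subject_column(toponyms):
    """Return (column type, {toponym: entity}) using a simple majority vote."""
    votes = Counter()
    for name in toponyms:
        # Each toponym votes once for every admissible type it may denote.
        types = {t for _, t in GAZETTEER.get(name, []) if t in GEO_TYPES}
        votes.update(types)
    if not votes:
        return None, {}
    # Sorting first makes ties deterministic (alphabetical order wins).
    column_type = max(sorted(votes), key=lambda t: votes[t])
    entities = {}
    for name in toponyms:
        # Keep the first candidate whose type matches the column type.
        for entity, entity_type in GAZETTEER.get(name, []):
            if entity_type == column_type:
                entities[name] = entity
                break
    return column_type, entities
```

In this sketch the column type disambiguates the cells: "Paris" resolves to the city when it appears next to "Lyon" and "Orsay", and to the department when it appears next to "Essonne".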

Adaptive Ontologies for Information Retrieval

We published the approach supported by the TARGET framework for Web Information Retrieval in the International Journal of Web Portals (IJWP) [22]. This approach was the core of the PhD thesis of Cédric Pruski, defended in April 2009.

Querying ontology-based annotations

We have pursued our work on integrating knowledge bases with semantic annotations made on more or less structured tagged documents. We have defined an approach in which RDF named graphs are used to distinguish uncertain semantic annotations from the RDF triples provided by the populated ontology. A user domain query is then reformulated to obtain answers that are ranked according to their provenance (knowledge bases or annotations) [61].
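The provenance-based ranking can be pictured with a small Python sketch. It covers only the ranking step, not the query reformulation of [61]; the graph names, the `ex:` identifiers, and the `ranked_subjects` helper are hypothetical, and quads are plain tuples rather than a real RDF store.

```python
# Hypothetical named graphs: one for the populated ontology (trusted),
# one for uncertain semantic annotations.
KB = "graph:kb"
ANNOT = "graph:annotations"

# Quads are (subject, predicate, object, named graph).
QUADS = [
    ("ex:doc1", "ex:mentions", "ex:Paris", KB),
    ("ex:doc2", "ex:mentions", "ex:Paris", ANNOT),
    ("ex:doc3", "ex:mentions", "ex:Lyon", ANNOT),
    ("ex:doc1", "ex:mentions", "ex:Paris", ANNOT),
]

# Lower rank means more trusted provenance.
RANK = {KB: 0, ANNOT: 1}


def ranked_subjects(predicate, obj):
    """Return matching subjects, knowledge-base answers before annotations."""
    best = {}
    for s, p, o, g in QUADS:
        if p == predicate and o == obj:
            # An answer keeps the best provenance among the graphs asserting it.
            best[s] = min(best.get(s, RANK[g]), RANK[g])
    return sorted(best, key=lambda s: (best[s], s))
```

An answer found in both graphs is ranked by its most trusted source, so `ex:doc1` comes before `ex:doc2` for the query on `ex:Paris`.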

Watermarking for ontologies

Ontologies are usually available under some type of license. The large ontologies of the Semantic Web, for example, are commonly licensed under a Creative Commons license or a GNU license. These licenses require giving credit to the authors of the ontology whenever the ontology is reused elsewhere. However, it can be hard to prove that an ontology has been reused, because ontologies contain world knowledge. Someone who “steals” an ontology and uses it elsewhere can always claim to have collected the data independently from real-world sources. To tackle this problem, we have studied approaches that watermark an ontology [43]. If a watermarked ontology is reused elsewhere, the mark proves that the ontology has been stolen. Existing approaches have mainly modified facts in the ontology to create a mark. This, however, compromises the precision of the ontology. We have therefore developed an approach that does not modify facts but instead removes certain facts, so that the precision of the ontology is not affected. We show that only a handful of facts have to be removed from an ontology to protect it against theft.
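A removal-based mark can be sketched in Python as follows. This is an illustrative scheme under stated assumptions, not the method of [43]: a secret key and a keyed hash (HMAC-SHA256) select the facts to withhold, and the functions `is_marked`, `watermark`, and `missing_marked` as well as the `rate` parameter are hypothetical.

```python
import hashlib
import hmac


def is_marked(fact, key, rate):
    """Decide from a keyed hash whether a (subject, predicate, object) fact is withheld."""
    digest = hmac.new(key, "|".join(fact).encode("utf-8"), hashlib.sha256).digest()
    # Roughly one fact out of `rate` is selected.
    return int.from_bytes(digest[:8], "big") % rate == 0


def watermark(facts, key, rate):
    """Publish the ontology without the selected facts; no fact is altered."""
    return {f for f in facts if not is_marked(f, key, rate)}


def missing_marked(original, suspect, key, rate):
    """Count how many of the withheld facts are also absent from a suspect copy."""
    marked = {f for f in original if is_marked(f, key, rate)}
    return len(marked - suspect), len(marked)
```

The owner keeps the original facts and the key. A suspect ontology that lacks all of the withheld facts while containing the rest is unlikely to have been collected independently, whereas every published fact remains true, which is why precision is preserved.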

Consistent query answering in DL-Lite

An important problem that arises in ontology-based data access is how to handle inconsistencies. In the database community, the related problem of querying databases that violate integrity constraints has been extensively studied under the name of consistent query answering. The standard approach is based on the notion of a repair, which is a database that satisfies the integrity constraints and is as similar as possible to the original database. Consistent answers are defined as those answers that hold in all repairs. A similar strategy can be used for description logics by replacing the integrity constraints with the ontology. Unfortunately, recent work on consistent query answering in description logics has shown this problem to be co-NP-hard in data complexity, even for instance queries and the simplest DL-Lite dialect. In light of this negative result, we considered the problem of identifying cases where consistent query answering is feasible and, in particular, can be done using query rewriting, with the aim of better understanding when query rewriting can be profitably used. In [51], we take some first steps towards this goal by formulating general conditions that can be used to prove that a consistent rewriting does or does not exist for a given DL-Lite TBox and instance query.
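The repair-based semantics can be made concrete with a brute-force Python sketch. It is only a toy under stated assumptions: the TBox is reduced to hypothetical concept inclusions (`SUBCLASS`) and disjointness axioms (`DISJOINT`), the ABox contains only concept assertions, and repairs are enumerated exhaustively, which takes exponential time and is not the rewriting approach studied in [51].

```python
from itertools import combinations

# Hypothetical toy TBox: concept inclusions and disjointness axioms.
SUBCLASS = {"Professor": "Person", "Student": "Person"}
DISJOINT = [("Professor", "Student")]


def closure(abox):
    """Add every assertion entailed by the concept inclusions."""
    facts = set(abox)
    changed = True
    while changed:
        changed = False
        for concept, individual in list(facts):
            parent = SUBCLASS.get(concept)
            if parent and (parent, individual) not in facts:
                facts.add((parent, individual))
                changed = True
    return facts


def consistent(abox):
    """An ABox is consistent if no individual belongs to two disjoint concepts."""
    facts = closure(abox)
    return not any(
        (b, individual) in facts
        for a, b in DISJOINT
        for concept, individual in facts
        if concept == a
    )


def repairs(abox):
    """Enumerate the maximal consistent subsets of the ABox, largest first."""
    assertions = sorted(abox)
    found = []
    for size in range(len(assertions), -1, -1):
        for subset in combinations(assertions, size):
            candidate = set(subset)
            # Skip subsets strictly contained in an already found repair.
            if consistent(candidate) and not any(candidate < r for r in found):
                found.append(candidate)
    return found


def consistent_instances(concept, abox):
    """Individuals that are instances of the concept in every repair."""
    per_repair = [
        {i for c, i in closure(r) if c == concept} for r in repairs(abox)
    ]
    return set.intersection(*per_repair)
```

With the ABox `{("Professor", "ann"), ("Student", "ann"), ("Student", "bob")}`, there are two repairs, one for each way of resolving the conflict about `ann`. `Person(ann)` holds in both and is therefore a consistent answer, whereas neither `Professor(ann)` nor `Student(ann)` is.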

Module-based data management in DL-Lite

The current trend for building an ontology-based data management system (DMS) is to capitalize on the efforts made to design a preexisting, well-established DMS (a reference system). The method amounts to extracting from the reference DMS a piece of schema relevant to the needs of the new application (a module), possibly personalizing it with extra constraints specific to the application under construction, and then managing a dataset using the resulting schema. We have revisited the reuse of a reference ontology-based DMS in order to build a new DMS with specific needs. We go one step further by considering not only the design of a module-based DMS (i.e., how to extract a module from an ontological schema) but also how a module-based DMS can benefit from the reference DMS to enhance its own data management capabilities. We consider the setting of the DL-Lite_A dialect of DL-Lite, which encompasses the foundations of the QL profile of OWL2 (i.e., DL-Lite_R), the W3C recommendation for efficiently managing large datasets. We introduce and study novel robustness properties for modules, which provide means to check easily that a robust module-based DMS evolves safely w.r.t. both the schema and the data of the reference DMS. From a module robust to consistency checking, for any data update in a corresponding module-based DMS, we show how to query the reference DMS to check that the local update does not introduce any inconsistency with the data and the constraints of the reference DMS. From a module robust to query answering, for any query asked to a module-based DMS, we show how to query the reference DMS to obtain additional answers by also exploiting the data stored in the reference DMS.
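The two services that a module-based DMS can obtain from the reference DMS can be pictured with a toy Python sketch. It does not implement the robustness properties or the query construction of the work above: the schema is reduced to one hypothetical disjointness constraint, the reference data is an in-memory set, and the function names are illustrative assumptions.

```python
# Hypothetical constraint shared by the module and the reference schema.
DISJOINT = [("JournalPaper", "ConferencePaper")]

# Hypothetical data held by the reference DMS, as (concept, individual) pairs.
REFERENCE_DATA = {("ConferencePaper", "doi1"), ("JournalPaper", "doi2")}


def conflicts_with_reference(update, reference_data=REFERENCE_DATA):
    """Reference assertions that a local insertion (concept, individual) would contradict."""
    concept, individual = update
    conflicts = set()
    for a, b in DISJOINT:
        # Disjointness is symmetric, so check both directions.
        for mine, other in ((a, b), (b, a)):
            if concept == mine and (other, individual) in reference_data:
                conflicts.add((other, individual))
    return conflicts


def extra_answers(concept, local_data, reference_data=REFERENCE_DATA):
    """Instances of the concept known to the reference DMS but not stored locally."""
    local = {i for c, i in local_data if c == concept}
    remote = {i for c, i in reference_data if c == concept}
    return remote - local
```

In this toy setting, inserting `JournalPaper(doi1)` locally is flagged because the reference DMS already states `ConferencePaper(doi1)`, and a local query for journal papers can be completed with `doi2` from the reference data.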